Abstract. Freedman's inequality is a martingale counterpart to Bernstein's inequality. This result
shows that the large-deviation behavior of a martingale is controlled by the predictable quadratic
variation and a uniform upper bound for the martingale difference sequence. Oliveira has recently
established a natural extension of Freedman's inequality that provides tail bounds for the maximum
singular value of a matrix-valued martingale. This note describes a different proof of the matrix
Freedman inequality that depends on a deep theorem of Lieb from matrix analysis. This argument
delivers sharp constants in the matrix Freedman inequality, and it also yields tail bounds for other
types of matrix martingales. The new techniques are adapted from recent work [Tro10b] by the
present author.

1. An Introduction to Freedman's Inequality

The Freedman inequality [Fre75, Thm. (1.6)] is a martingale extension of the Bernstein inequality.

This result demonstrates that a martingale exhibits normal-type concentration near its mean value 
on a scale determined by the predictable quadratic variation, and the upper tail has Poisson-type 
decay on a scale determined by a uniform bound on the difference sequence. 

Oliveira [Oli10, Thm. 1.2] proves that Freedman's inequality extends, in a certain form, to the
matrix setting. The purpose of this note is to demonstrate that the methods from the author's
paper [Tro10b] can be used to establish a sharper version of the matrix Freedman inequality.
Furthermore, this approach offers a transparent way to obtain other probability inequalities for
adapted sequences.

Let us introduce some notation and background on martingales so that we can state Freedman's 
original result rigorously. Afterward, we continue with a statement of our main results and a 
presentation of the methods that we need to prove the matrix generalization. 

1.1. Martingales. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let
$\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}$ be
a filtration of the master sigma algebra. We write $\mathbb{E}_k$ for the expectation conditioned
on $\mathcal{F}_k$. A martingale is a (real-valued) random process $\{Y_k : k = 0, 1, 2, \dots\}$
that is adapted to the filtration and that satisfies two properties:

\[ \mathbb{E}_{k-1} Y_k = Y_{k-1} \quad\text{and}\quad \mathbb{E}|Y_k| < +\infty \quad\text{for } k = 1, 2, 3, \dots. \]

For simplicity, we assume that the initial value of a martingale is null: $Y_0 = 0$. The difference
sequence is the random process defined by

\[ X_k = Y_k - Y_{k-1} \quad\text{for } k = 1, 2, 3, \dots. \]

Roughly, the present value of a martingale depends only on the past values, and the martingale 
has the status quo property: today, on average, is the same as yesterday. 
1.2. Freedman's Inequality. Freedman uses a powerful stopping-time argument to establish the
following theorem for scalar martingales [Fre75, Thm. (1.6)].

Theorem 1.1 (Freedman). Consider a real-valued martingale $\{Y_k : k = 0, 1, 2, \dots\}$ with difference
sequence $\{X_k : k = 1, 2, 3, \dots\}$. Assume that the difference sequence is uniformly bounded:

\[ X_k \le R \quad\text{almost surely for } k = 1, 2, 3, \dots. \]

Define the predictable quadratic variation process of the martingale:

\[ W_k := \sum_{j=1}^{k} \mathbb{E}_{j-1}(X_j^2) \quad\text{for } k = 1, 2, 3, \dots. \]

Then, for all $t \ge 0$ and $\sigma^2 > 0$,

\[ \mathbb{P}\{\exists k \ge 0 : Y_k \ge t \text{ and } W_k \le \sigma^2\} \le \exp\left\{ -\frac{t^2/2}{\sigma^2 + Rt/3} \right\}. \]

When the difference sequence $\{X_k\}$ consists of independent random variables, the predictable
quadratic variation is no longer random. In this case, Freedman's inequality reduces to the usual
Bernstein inequality [Lug09, Thm. 6].
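
As a sanity check on Theorem 1.1 in the independent case, the following sketch compares the bound with an exactly computed tail. It assumes a simple random walk with independent steps equal to +1 or -1 with probability 1/2 each, so that $R = 1$ and $W_k = k$; the function names `freedman_bound` and `max_walk_tail` are illustrative and do not come from the source.

```python
import math

def freedman_bound(t, sigma2, R):
    """Right-hand side of Theorem 1.1: exp(-(t^2/2) / (sigma2 + R*t/3))."""
    if t < 0 or sigma2 <= 0 or R < 0:
        raise ValueError("need t >= 0, sigma2 > 0, R >= 0")
    return math.exp(-(t * t / 2.0) / (sigma2 + R * t / 3.0))

def max_walk_tail(n, t):
    """Exact P{max_{0<=k<=n} Y_k >= t} for a +/-1 random walk (integer t >= 1)."""
    if t < 1:
        raise ValueError("t must be a positive integer")
    alive = {0: 1.0}   # position -> probability of being there without reaching t
    hit = 0.0          # probability that level t has already been reached
    for _ in range(n):
        nxt = {}
        for pos, p in alive.items():
            for step in (-1, 1):
                q = pos + step
                if q >= t:
                    hit += 0.5 * p
                else:
                    nxt[q] = nxt.get(q, 0.0) + 0.5 * p
        alive = nxt
    return hit

# Assumed setting: W_k = k, so {W_k <= sigma2} with sigma2 = n means k <= n.
n, t = 20, 10
exact = max_walk_tail(n, t)
bound = freedman_bound(t, sigma2=n, R=1.0)
print(f"exact tail = {exact:.4f}, Freedman bound = {bound:.4f}")
```

In this setting the event in Theorem 1.1 with $\sigma^2 = n$ is exactly the event that the walk reaches level $t$ within $n$ steps, which is why a dynamic program over positions suffices.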

1.3. Matrix Martingales. Matrix martingales are defined in much the same manner as scalar
martingales. Consider a random process $\{Y_k : k = 0, 1, 2, \dots\}$ whose values are matrices of finite
dimension. We say that the process is a matrix martingale when

\[ \mathbb{E}_{k-1} Y_k = Y_{k-1} \quad\text{and}\quad \mathbb{E}\|Y_k\| < +\infty \quad\text{for } k = 1, 2, 3, \dots. \]

We write $\|\cdot\|$ for the spectral norm, which coincides with the operator norm between Hilbert spaces.
As before, we assume that $Y_0 = 0$, and we define the difference sequence $\{X_k : k = 1, 2, 3, \dots\}$ via
the relation

\[ X_k = Y_k - Y_{k-1} \quad\text{for } k = 1, 2, 3, \dots. \]

A matrix-valued random process is a martingale if and only if we obtain a scalar martingale when
we track each fixed coordinate in time.

1.4. Freedman's Inequality for Matrices. In the elegant paper [Oli10], Oliveira establishes
that it is possible to extend Freedman's inequality to the matrix setting. He studies martingales
that take self-adjoint matrix values, and he shows that the maximum eigenvalue of the martingale
satisfies a result very similar to Freedman's inequality. The uniform bound $R$ and the predictable
quadratic variation $\{W_k\}$ are replaced by natural noncommutative extensions. As a consequence,
these results have powerful applications in random matrix theory.

In this note, we establish a sharper version of Oliveira's theorem [Oli10, Thm. 1.2].

Theorem 1.2 (Matrix Freedman). Consider a matrix martingale $\{Y_k : k = 0, 1, 2, \dots\}$ whose
values are self-adjoint matrices with dimension $d$, and let $\{X_k : k = 1, 2, 3, \dots\}$ be the difference
sequence. Assume that the difference sequence is uniformly bounded in the sense that

\[ \lambda_{\max}(X_k) \le R \quad\text{almost surely for } k = 1, 2, 3, \dots. \]

Define the predictable quadratic variation process of the martingale:

\[ W_k := \sum_{j=1}^{k} \mathbb{E}_{j-1}(X_j^2) \quad\text{for } k = 1, 2, 3, \dots. \]

Then, for all $t \ge 0$ and $\sigma^2 > 0$,

\[ \mathbb{P}\{\exists k \ge 0 : \lambda_{\max}(Y_k) \ge t \text{ and } \|W_k\| \le \sigma^2\} \le d \cdot \exp\left\{ -\frac{t^2/2}{\sigma^2 + Rt/3} \right\}. \]

Here and elsewhere, $\lambda_{\max}$ denotes the algebraically largest eigenvalue of a self-adjoint matrix,
and $\|\cdot\|$ denotes the spectral norm, which returns the largest singular value of a matrix.

Theorem 1.2 offers several concrete improvements over Oliveira's original work. His theorem
[Oli10, Thm. 1.2] requires a stronger uniform bound of the form $\|X_k\| \le R$, and the constants
in his inequality are somewhat larger (but still very reasonable).
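
To make the role of the dimensional factor $d$ concrete, the following sketch evaluates the right-hand side of Theorem 1.2 and inverts it for a prescribed failure probability. The inversion formula is obtained here by solving the quadratic $t^2/2 = L(\sigma^2 + Rt/3)$ with $L = \log(d/\delta)$; it is a derived convenience, not a statement from the source, and the function names are hypothetical.

```python
import math

def matrix_freedman_bound(t, sigma2, R, d):
    """Right-hand side of Theorem 1.2: d * exp(-(t^2/2) / (sigma2 + R*t/3))."""
    if t < 0 or sigma2 <= 0 or R < 0 or d < 1:
        raise ValueError("need t >= 0, sigma2 > 0, R >= 0, d >= 1")
    return d * math.exp(-(t * t / 2.0) / (sigma2 + R * t / 3.0))

def deviation_level(delta, sigma2, R, d):
    """Level t at which the bound equals delta, for 0 < delta < d.

    Solves t^2 - (2*R*L/3)*t - 2*sigma2*L = 0 with L = log(d/delta),
    taking the positive root.
    """
    if not 0 < delta < d:
        raise ValueError("need 0 < delta < d")
    L = math.log(d / delta)
    a = R * L / 3.0
    return a + math.sqrt(a * a + 2.0 * sigma2 * L)

# Example: the dimension enters the level t only through log(d / delta).
for d in (10, 1000):
    t = deviation_level(0.01, sigma2=4.0, R=1.0, d=d)
    print(d, round(t, 3), matrix_freedman_bound(t, 4.0, 1.0, d))
```

The dimension affects the deviation level only logarithmically, which is the usual reading of bounds with a leading factor $d$.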

We prove Theorem 1.2 in Section 3 as a consequence of a stronger probability inequality that
follows from a general result for adapted sequences of matrices. These tail bounds cannot be
sharpened without changing their structure; see [Tro10b, §4 and §6] for a more detailed discussion.

As an immediate corollary of Theorem 1.2, we obtain a result for rectangular matrices.

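Corollary 1.3 (Rectangular Matrix Freedman). Consider a matrix martingale $\{Y_k : k = 0, 1, 2, \dots\}$
whose values are matrices with dimension $d_1 \times d_2$, and let $\{X_k : k = 1, 2, 3, \dots\}$ be the difference
sequence. Assume that the difference sequence is uniformly bounded:

\[ \|X_k\| \le R \quad\text{almost surely for } k = 1, 2, 3, \dots. \]

Define two predictable quadratic variation processes for this martingale:

\[ W_{\mathrm{col},k} := \sum_{j=1}^{k} \mathbb{E}_{j-1}(X_j X_j^*) \quad\text{and}\quad W_{\mathrm{row},k} := \sum_{j=1}^{k} \mathbb{E}_{j-1}(X_j^* X_j) \quad\text{for } k = 1, 2, 3, \dots. \]

Then, for all $t \ge 0$ and $\sigma^2 > 0$,

\[ \mathbb{P}\{\exists k \ge 0 : \|Y_k\| \ge t \text{ and } \max\{\|W_{\mathrm{col},k}\|, \|W_{\mathrm{row},k}\|\} \le \sigma^2\} \le (d_1 + d_2) \cdot \exp\left\{ -\frac{t^2/2}{\sigma^2 + Rt/3} \right\}. \]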

Proof Sketch. Define a self-adjoint matrix martingale $\{Z_k\}$ with dimension $d = d_1 + d_2$ via

\[ Z_k := \begin{bmatrix} 0 & Y_k \\ Y_k^* & 0 \end{bmatrix}. \]

Apply Theorem 1.2 to this martingale. See [Tro10b, §2.6 and §4.2] for some additional details about
this type of argument. □
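
The two facts that make this dilation argument work can be checked numerically: the largest eigenvalue of the dilation equals the spectral norm of the original matrix, and the square of the dilation is block diagonal with blocks $YY^*$ and $Y^*Y$. The sketch below uses NumPy; the helper name `hermitian_dilation` is illustrative.

```python
import numpy as np

def hermitian_dilation(Y):
    """Return the self-adjoint block matrix [[0, Y], [Y^*, 0]]."""
    d1, d2 = Y.shape
    dtype = np.result_type(Y.dtype, np.float64)
    Z = np.zeros((d1 + d2, d1 + d2), dtype=dtype)
    Z[:d1, d1:] = Y
    Z[d1:, :d1] = Y.conj().T
    return Z

rng = np.random.default_rng(0)
Y = rng.standard_normal((3, 5))
Z = hermitian_dilation(Y)

# lambda_max(Z) coincides with the spectral norm of Y.
lam_max = np.linalg.eigvalsh(Z)[-1]
print(np.isclose(lam_max, np.linalg.norm(Y, 2)))

# Z^2 = diag(Y Y^*, Y^* Y), which links W_col and W_row to the
# quadratic variation of the dilated martingale.
Z2 = Z @ Z
print(np.allclose(Z2[:3, :3], Y @ Y.T), np.allclose(Z2[3:, 3:], Y.T @ Y))
```

Since the norm of a block-diagonal matrix is the maximum of the norms of its blocks, the quadratic variation of $\{Z_k\}$ has norm $\max\{\|W_{\mathrm{col},k}\|, \|W_{\mathrm{row},k}\|\}$, matching the hypothesis of the corollary.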

1.5. Tools and Techniques. In his paper [Oli10], Oliveira describes a way to transport Freedman's
stopping-time argument to the matrix setting. The main technical obstacle is to control the
evolution of the moment generating function (mgf) of the matrix martingale. Oliveira accomplishes
this task using an insightful variation on an idea due to Ahlswede and Winter [AW02, App.]. This
method, however, does not result in the sharpest bounds on the matrix mgf.

This note demonstrates that the ideas from [Tro10b] allow us to obtain sharp estimates for
the mgf with minimal effort. Our main tool is a deep theorem [Lie73, Thm. 6] of Lieb.

Theorem 1.4 (Lieb, 1973). Fix a self-adjoint matrix $H$. The function

\[ A \longmapsto \operatorname{tr} \exp(H + \log(A)) \]

is concave on the positive-definite cone.

See [Tro10b, §3.3] and [Tro10a] for some additional discussion of this result. We apply Theorem 1.4
through the following simple corollary [Tro10b, Cor. 3.2]. We include a proof for completeness.

Corollary 1.5 (Tropp, 2010). Let $H$ be a fixed self-adjoint matrix, and let $X$ be a random
self-adjoint matrix. Then

\[ \mathbb{E} \operatorname{tr} \exp(H + X) \le \operatorname{tr} \exp(H + \log(\mathbb{E}\, e^{X})). \]

Proof. Define the random matrix $Y = e^{X}$, and calculate that

\[ \mathbb{E} \operatorname{tr} \exp(H + X) = \mathbb{E} \operatorname{tr} \exp(H + \log(Y)) \le \operatorname{tr} \exp(H + \log(\mathbb{E}\, Y)) = \operatorname{tr} \exp(H + \log(\mathbb{E}\, e^{X})). \]

The first identity follows because the logarithm can be defined as the functional inverse of the
matrix exponential. Lieb's result, Theorem 1.4, establishes that the trace function is concave in $Y$,
so we may invoke Jensen's inequality to draw the expectation inside the logarithm. □
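
Corollary 1.5 can be probed numerically for a random matrix with finitely many values. The sketch below assumes real symmetric matrices and a finite distribution given by `samples` and `probs`; both names, and the helper `herm_fun`, are illustrative choices rather than notation from the source. Matrix functions are applied through the eigendecomposition, which is valid for self-adjoint arguments.

```python
import numpy as np

def herm_fun(A, f):
    """Apply the scalar function f to a self-adjoint matrix A spectrally."""
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.conj().T   # U diag(f(w)) U^*

def lieb_jensen_sides(H, samples, probs):
    """Return (E tr exp(H + X), tr exp(H + log E e^X)) for a finite law."""
    lhs = sum(p * np.trace(herm_fun(H + X, np.exp)).real
              for X, p in zip(samples, probs))
    mean_exp = sum(p * herm_fun(X, np.exp) for X, p in zip(samples, probs))
    rhs = np.trace(herm_fun(H + herm_fun(mean_exp, np.log), np.exp)).real
    return lhs, rhs

def random_symmetric(rng, d):
    B = rng.standard_normal((d, d))
    return (B + B.T) / 2

rng = np.random.default_rng(1)
H = random_symmetric(rng, 4)
samples = [random_symmetric(rng, 4) for _ in range(4)]
probs = [0.1, 0.2, 0.3, 0.4]
lhs, rhs = lieb_jensen_sides(H, samples, probs)
print(lhs <= rhs)
```

When $H$ and every value of $X$ commute (for instance, when all are diagonal), the two sides agree, so the gap in the inequality reflects noncommutativity.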

A significant advantage of our point of view is that the proof extends in a transparent way to
yield other types of probability inequalities for adapted sequences of random matrices. We have
elaborated on this observation in a preliminary version of this work that is now available as a
technical report [Tro11]. Here, for brevity, we focus on proving Freedman's inequality.

2. Tail Bounds via Martingale Methods 

In this section, we show that Freedman's techniques extend to the matrix setting with minor
(but profound) changes. The key idea is to use Corollary 1.5 to control the evolution of a matrix
version of the moment generating function. This argument culminates in a rather general theorem
on the large deviation behavior of an adapted sequence of random matrices. In Section 3, we specialize
this result to obtain Freedman's inequality.

2.1. Additional Terminology. We say that a sequence $\{X_k\}$ of random matrices is adapted to
the filtration when each $X_k$ is measurable with respect to $\mathcal{F}_k$. Loosely speaking, an adapted
sequence is one where the present depends only upon the past. We say that a sequence $\{V_k\}$ of
random matrices is previsible when each $V_k$ is measurable with respect to $\mathcal{F}_{k-1}$. In particular,
the sequence $\{\mathbb{E}_{k-1} X_k\}$ of conditional expectations of an adapted sequence $\{X_k\}$ is previsible. A
stopping time is a random variable $\kappa : \Omega \to \mathbb{N}_0 \cup \{\infty\}$ that satisfies

\[ \{\kappa \le k\} \in \mathcal{F}_k \quad\text{for } k = 0, 1, 2, \dots, \infty. \]

In words, we can determine if the stopping time has arrived from current and past experience.

2.2. The Large Deviation Supermartingale. Consider an adapted random process
$\{X_k : k = 1, 2, 3, \dots\}$ and a previsible random process $\{V_k : k = 1, 2, 3, \dots\}$ whose values are
self-adjoint matrices with dimension $d$. Suppose that the two processes are connected through a
relation of the form

\[ \log \mathbb{E}_{k-1}\, e^{\theta X_k} \preccurlyeq g(\theta) \cdot V_k \quad\text{for } \theta > 0, \tag{2.1} \]

where the function $g : (0, \infty) \to [0, \infty]$. The left-hand side should be interpreted as a conditional
cumulant generating function (cgf); see [Tro10b, Sec. 3.1]. It is convenient to introduce the partial
sums of the original process and the partial sums of the conditional cgf bounds:

\[ Y_0 := 0 \quad\text{and}\quad Y_k := \sum_{j=1}^{k} X_j, \]

\[ W_0 := 0 \quad\text{and}\quad W_k := \sum_{j=1}^{k} V_j. \]

The random matrix $W_k$ can be viewed as a measure of the total variability of the process $\{X_k\}$
up to time $k$. The partial sum $Y_k$ is unlikely to be large unless $W_k$ is also large.

To continue, we fix the function $g$ and a positive number $\theta$. Define a real-valued function with
two self-adjoint matrix arguments:

\[ G_\theta(Y, W) := \operatorname{tr} \exp(\theta Y - g(\theta) \cdot W). \]

We use the function $G_\theta$ to construct a real-valued random process:

\[ S_k := S_k(\theta) = G_\theta(Y_k, W_k) \quad\text{for } k = 0, 1, 2, \dots. \tag{2.2} \]

This process is an evolving measure of the discrepancy between the partial sum process $\{Y_k\}$ and
the cumulant sum process $\{W_k\}$. The following lemma describes the key properties of this random
sequence. In particular, the average discrepancy decreases with time.

Lemma 2.1. For each fixed $\theta > 0$, the random process $\{S_k(\theta) : k = 0, 1, 2, \dots\}$ defined in (2.2) is
a positive supermartingale whose initial value $S_0 = d$.

Proof. It is easily seen that $S_k$ is positive because the exponential of a self-adjoint matrix is positive
definite, and the trace of a positive-definite matrix is positive. We obtain the initial value from a
short calculation:

\[ S_0 = \operatorname{tr} \exp(\theta Y_0 - g(\theta) \cdot W_0) = \operatorname{tr} \exp(0) = \operatorname{tr} I = d. \]

To prove that the process is a supermartingale, we ascend a short chain of inequalities.

\begin{align*}
\mathbb{E}_{k-1} S_k &= \mathbb{E}_{k-1} \operatorname{tr} \exp(\theta Y_{k-1} - g(\theta) \cdot W_k + \theta X_k) \\
&\le \operatorname{tr} \exp(\theta Y_{k-1} - g(\theta) \cdot W_k + \log \mathbb{E}_{k-1}\, e^{\theta X_k}) \\
&\le \operatorname{tr} \exp(\theta Y_{k-1} - g(\theta) \cdot W_k + g(\theta) \cdot V_k) \\
&= \operatorname{tr} \exp(\theta Y_{k-1} - g(\theta) \cdot W_{k-1}) \\
&= S_{k-1}.
\end{align*}

In the second line, we invoke Corollary 1.5 conditional on $\mathcal{F}_{k-1}$. This act is legal because $Y_{k-1}$ and
$W_k$ are both measurable with respect to $\mathcal{F}_{k-1}$. The next inequality depends on the assumption (2.1)
together with the fact that the trace exponential is monotone with respect to the semidefinite
order [Pet94, §2.2]. The last step follows because $\{W_k\}$ is the sequence of partial sums of $\{V_k\}$. □
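
The supermartingale property can be observed exactly in a small example. The sketch below assumes a specific model that is not taken from this note: $X_k = \varepsilon_k A_k$ with fixed symmetric matrices $A_k$ and independent Rademacher signs $\varepsilon_k$. In that model $\mathbb{E}\, e^{\theta X_k} = \cosh(\theta A_k) \preccurlyeq \exp(\theta^2 A_k^2 / 2)$, so the cgf relation holds with $g(\theta) = \theta^2/2$ and $V_k = A_k^2$. The code enumerates all sign patterns, so the expectations $\mathbb{E}\, S_k$ are computed without sampling error.

```python
import itertools
import numpy as np

def trace_exp(A):
    """tr exp(A) for a real symmetric matrix A, via its eigenvalues."""
    return float(np.sum(np.exp(np.linalg.eigvalsh(A))))

def expected_S(A_list, theta):
    """Exact E S_k for k = 0..n in the assumed Rademacher model.

    Uses g(theta) = theta^2 / 2 and V_k = A_k^2 (an illustrative choice).
    """
    n = len(A_list)
    d = A_list[0].shape[0]
    g = theta ** 2 / 2.0
    means = [float(d)]                      # S_0 = tr I = d
    for k in range(1, n + 1):
        W = sum(A @ A for A in A_list[:k])  # W_k = sum of V_j, deterministic here
        total = 0.0
        for signs in itertools.product((-1.0, 1.0), repeat=k):
            Y = sum(s * A for s, A in zip(signs, A_list[:k]))
            total += trace_exp(theta * Y - g * W)
        means.append(total / 2 ** k)
    return means

rng = np.random.default_rng(0)
A_list = []
for _ in range(5):
    B = rng.standard_normal((3, 3))
    A_list.append((B + B.T) / 2)

means = expected_S(A_list, theta=0.7)
print([round(m, 4) for m in means])  # a nonincreasing sequence starting at d = 3
```

Taking expectations in Lemma 2.1 gives $\mathbb{E}\, S_k \le \mathbb{E}\, S_{k-1} \le d$, which is the monotone behavior the printed sequence should display.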

Finally, we present a simple inequality for the function $G_\theta$ that holds when we have control on
the eigenvalues of its arguments.

Lemma 2.2. Suppose that $\lambda_{\max}(Y) \ge t$ and that $\lambda_{\max}(W) \le w$. For each $\theta > 0$,

\[ G_\theta(Y, W) \ge e^{\theta t - g(\theta) \cdot w}. \]

Proof. Recall that $g(\theta) \ge 0$. The bound results from a straightforward calculation:

\[ G_\theta(Y, W) = \operatorname{tr} e^{\theta Y - g(\theta) \cdot W} \ge \operatorname{tr} e^{\theta Y - g(\theta) \cdot wI} \ge \lambda_{\max}\left(e^{\theta Y - g(\theta) \cdot wI}\right) = e^{\theta \lambda_{\max}(Y) - g(\theta) \cdot w} \ge e^{\theta t - g(\theta) \cdot w}. \]

The first inequality depends on the semidefinite relation $W \preccurlyeq wI$ and the monotonicity of the trace
exponential with respect to the semidefinite order [Pet94, §2.2]. The second inequality relies on the
fact that the trace of a psd matrix is at least as large as its maximum eigenvalue. The identity in
the third step follows from the spectral mapping theorem and elementary properties of the maximum
eigenvalue map. □
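
A quick numerical illustration of Lemma 2.2, taking $t = \lambda_{\max}(Y)$ and $w = \lambda_{\max}(W)$ so that the hypotheses hold with equality. The value of $g(\theta)$ is passed in as a nonnegative number `g`; the function names are illustrative.

```python
import math
import numpy as np

def G(theta, g, Y, W):
    """G_theta(Y, W) = tr exp(theta*Y - g*W) for real symmetric Y, W."""
    return float(np.sum(np.exp(np.linalg.eigvalsh(theta * Y - g * W))))

def lemma_lower_bound(theta, g, Y, W):
    """exp(theta*t - g*w) with t = lambda_max(Y) and w = lambda_max(W)."""
    t = np.linalg.eigvalsh(Y)[-1]
    w = np.linalg.eigvalsh(W)[-1]
    return math.exp(theta * t - g * w)

rng = np.random.default_rng(2)
B, C = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
Y = (B + B.T) / 2
W = C @ C.T                      # a psd matrix, as for a sum of squares
theta = 0.5
g = math.exp(theta) - theta - 1  # one admissible nonnegative value of g(theta)
print(G(theta, g, Y, W) >= lemma_lower_bound(theta, g, Y, W))
```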

2.3. A Tail Bound for Adapted Sequences. Our key theorem for adapted sequences provides
a bound on the probability that the partial sum of a matrix-valued random process is large. In
the next section, we apply this result to establish a stronger version of Theorem 1.2. This result
also allows us to develop other types of probability inequalities for adapted sequences of random
matrices; see the technical report [Tro11] for additional details.

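Theorem 2.3 (Master Tail Bound for Adapted Sequences). Consider an adapted sequence $\{X_k\}$
and a previsible sequence $\{V_k\}$ of self-adjoint matrices with dimension $d$. Assume these sequences
satisfy the relations

\[ \log \mathbb{E}_{k-1}\, e^{\theta X_k} \preccurlyeq g(\theta) \cdot V_k \quad\text{almost surely for each } \theta > 0, \tag{2.3} \]

where the function $g : (0, \infty) \to [0, \infty]$. In particular, the hypothesis (2.3) holds when

\[ \mathbb{E}_{k-1}\, e^{\theta X_k} \preccurlyeq e^{g(\theta) \cdot V_k} \quad\text{almost surely for each } \theta > 0. \tag{2.4} \]

Define the partial sum processes

\[ Y_k := \sum_{j=1}^{k} X_j \quad\text{and}\quad W_k := \sum_{j=1}^{k} V_j. \]

Then, for all $t, w \in \mathbb{R}$,

\[ \mathbb{P}\{\exists k \ge 0 : \lambda_{\max}(Y_k) \ge t \text{ and } \lambda_{\max}(W_k) \le w\} \le d \cdot \inf_{\theta > 0} e^{-\theta t + g(\theta) \cdot w}. \]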

Proof. To begin, note that the cgf hypothesis (2.3) holds in the presence of (2.4) because the
logarithm is an operator monotone function [Bha97, Ch. V].

The overall proof strategy is identical with the stopping-time technique used by Freedman [Fre75].
Fix a positive parameter $\theta$, which we will optimize later. Following the discussion in §2.2, we
introduce the random process $S_k := G_\theta(Y_k, W_k)$. Lemma 2.1 implies that $\{S_k\}$ is a positive
supermartingale with initial value $d$. These simple properties of the auxiliary random process
distill all the essential information from the hypotheses of the theorem.

Define a stopping time $\kappa$ by finding the first time instant $k$ when the maximum eigenvalue of the
partial sum process reaches the level $t$ even though the sum of cgf bounds has maximum eigenvalue
no larger than $w$.

\[ \kappa := \inf\{k \ge 0 : \lambda_{\max}(Y_k) \ge t \text{ and } \lambda_{\max}(W_k) \le w\}. \]

When the infimum is taken over the empty set, the stopping time $\kappa = \infty$. Consider a system of
exceptional events:

\[ E_k := \{\lambda_{\max}(Y_k) \ge t \text{ and } \lambda_{\max}(W_k) \le w\} \quad\text{for } k = 0, 1, 2, \dots. \]

Construct the event $E := \bigcup_{k=0}^{\infty} E_k$ that one or more of these exceptional situations takes place.
The intuition behind this definition is that the partial sum $Y_k$ is typically not large unless the
process $\{X_k\}$ has varied substantially, a situation that the bound on $W_k$ disallows. As a result,
the event $E$ is rather unlikely.

We are prepared to estimate the probability of the exceptional event. First, note that $\kappa < \infty$ on
the event $E$. Therefore, Lemma 2.2 provides a conditional lower bound for the process $\{S_k\}$ at the
stopping time $\kappa$:

\[ S_\kappa = G_\theta(Y_\kappa, W_\kappa) \ge e^{\theta t - g(\theta) \cdot w} \quad\text{on the event } E. \]
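Since $\{S_k\}$ is a positive supermartingale with $\mathbb{E}\, S_k \le d$ for each (finite) index $k$, the stopped
process $\{S_{k \wedge \kappa}\}$ is also a positive supermartingale with initial value $d$, and Fatou's lemma
yields $\mathbb{E}[S_\kappa \mathbb{1}\{\kappa < \infty\}] \le d$. Consequently, omitting any terms with $\mathbb{P}\{\kappa = k\} = 0$,

\begin{align*}
d &\ge \sum_{k=0}^{\infty} \mathbb{E}[S_\kappa \mid \kappa = k] \cdot \mathbb{P}\{\kappa = k\} = \int_{\{\kappa < \infty\}} S_\kappa \, \mathrm{d}\mathbb{P} \\
&\ge \int_{E} S_\kappa \, \mathrm{d}\mathbb{P} \ge \mathbb{P}(E) \cdot \inf_{E} S_\kappa \ge \mathbb{P}(E) \cdot e^{\theta t - g(\theta) \cdot w}.
\end{align*}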

We require the fact that $S_\kappa$ is positive to justify these inequalities. Rearrange the relation to obtain

\[ \mathbb{P}(E) \le d \cdot e^{-\theta t + g(\theta) \cdot w}. \]

Minimize the right-hand side with respect to $\theta$ to complete the main part of the argument. □

3. Proof of Freedman's Inequality 

In this section, we use the general martingale deviation bound, Theorem 2.3, to prove a stronger
version of Theorem 1.2.

Theorem 3.1. Consider an adapted sequence $\{X_k\}$ of self-adjoint matrices with dimension $d$ that
satisfy the relations

\[ \mathbb{E}_{k-1} X_k = 0 \quad\text{and}\quad \lambda_{\max}(X_k) \le R \quad\text{almost surely for } k = 1, 2, 3, \dots. \]

Define the partial sums

\[ Y_k := \sum_{j=1}^{k} X_j \quad\text{and}\quad W_k := \sum_{j=1}^{k} \mathbb{E}_{j-1}(X_j^2) \quad\text{for } k = 0, 1, 2, \dots. \]

Then, for all $t \ge 0$ and $\sigma^2 > 0$,

\[ \mathbb{P}\{\exists k \ge 0 : \lambda_{\max}(Y_k) \ge t \text{ and } \|W_k\| \le \sigma^2\} \le d \cdot \exp\left\{ -\frac{\sigma^2}{R^2} \cdot h\left(\frac{Rt}{\sigma^2}\right) \right\}. \]

The function $h(u) := (1 + u) \log(1 + u) - u$ for $u \ge 0$.

Theorem 1.2 follows easily from this result.

Proof of Theorem 1.2 from Theorem 3.1. To derive Theorem 1.2, we note that the difference
sequence $\{X_k\}$ of a matrix martingale $\{Y_k\}$ satisfies the conditions of Theorem 3.1, and the
martingale can be expressed using partial sums of the difference sequence. Finally, we apply the
numerical inequality

\[ h(u) \ge \frac{u^2/2}{1 + u/3} \quad\text{for } u \ge 0, \]

which we obtain by comparing derivatives. □
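
The numerical inequality, and the way it converts the exponent of Theorem 3.1 into the exponent of Theorem 1.2 through the substitution $u = Rt/\sigma^2$, can be spot-checked on a grid. This is a sketch with illustrative function names; a grid check is evidence, not a proof.

```python
import math

def h(u):
    """h(u) = (1 + u) log(1 + u) - u for u >= 0."""
    return (1.0 + u) * math.log1p(u) - u

def bernstein_lower(u):
    """The lower bound (u^2/2) / (1 + u/3)."""
    return (u * u / 2.0) / (1.0 + u / 3.0)

def bennett_exponent(t, sigma2, R):
    """Exponent in Theorem 3.1: (sigma2 / R^2) * h(R t / sigma2)."""
    return (sigma2 / R ** 2) * h(R * t / sigma2)

def freedman_exponent(t, sigma2, R):
    """Exponent in Theorem 1.2: (t^2/2) / (sigma2 + R t / 3)."""
    return (t * t / 2.0) / (sigma2 + R * t / 3.0)

grid = [0.01 * i for i in range(1, 5001)]   # u from 0.01 to 50
print(all(h(u) >= bernstein_lower(u) for u in grid))

# With u = R t / sigma2, (sigma2/R^2) * bernstein_lower(u) is the Freedman exponent.
t, sigma2, R = 3.0, 2.0, 0.5
u = R * t / sigma2
print(math.isclose((sigma2 / R ** 2) * bernstein_lower(u),
                   freedman_exponent(t, sigma2, R)))
```

A larger exponent means a smaller tail bound, so Theorem 3.1 is never weaker than Theorem 1.2 at the same $(t, \sigma^2, R)$.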



3.1. Demonstration of Theorem 3.1. We conclude with the proof of Theorem 3.1. The argument
depends on the following estimate for the moment generating function of a zero-mean random
matrix whose eigenvalues are uniformly bounded. See [Tro10b, Lem. 6.7] for the proof.

Lemma 3.2 (Freedman mgf). Suppose that $X$ is a random self-adjoint matrix that satisfies

\[ \mathbb{E} X = 0 \quad\text{and}\quad \lambda_{\max}(X) \le 1. \]

Then

\[ \mathbb{E}\, e^{\theta X} \preccurlyeq \exp\left((e^{\theta} - \theta - 1) \cdot \mathbb{E}(X^2)\right) \quad\text{for } \theta > 0. \]

The main result follows quickly from this lemma.



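Proof of Theorem 3.1. We assume that $R = 1$; the general result follows by re-scaling since $Y_k$ is
1-homogeneous and $W_k$ is 2-homogeneous. Invoke Lemma 3.2 conditionally to see that

\[ \mathbb{E}_{k-1}\, e^{\theta X_k} \preccurlyeq \exp\left(g(\theta) \cdot \mathbb{E}_{k-1}(X_k^2)\right) \quad\text{where } g(\theta) := e^{\theta} - \theta - 1. \]

Theorem 2.3 now implies that

\[ \mathbb{P}\{\exists k \ge 0 : \lambda_{\max}(Y_k) \ge t \text{ and } \lambda_{\max}(W_k) \le \sigma^2\} \le d \cdot \inf_{\theta > 0} e^{-\theta t + g(\theta) \cdot \sigma^2}. \]

The infimum is achieved when $\theta = \log(1 + t/\sigma^2)$. Finally, note that the norm of a
positive-semidefinite matrix, such as $W_k$, equals its largest eigenvalue. □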
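
The optimization over $\theta$ in the case $R = 1$ can be verified numerically. The sketch below checks that $\theta = \log(1 + t/\sigma^2)$ minimizes the exponent $-\theta t + g(\theta)\sigma^2$ over a grid and that the minimum value equals $-\sigma^2 h(t/\sigma^2)$, in agreement with Theorem 3.1. The function names are illustrative.

```python
import math

def exponent(theta, t, sigma2):
    """-theta*t + g(theta)*sigma2 with g(theta) = e^theta - theta - 1."""
    return -theta * t + (math.exp(theta) - theta - 1.0) * sigma2

def theta_star(t, sigma2):
    """Claimed minimizer: log(1 + t / sigma2)."""
    return math.log1p(t / sigma2)

def h(u):
    """h(u) = (1 + u) log(1 + u) - u."""
    return (1.0 + u) * math.log1p(u) - u

t, sigma2 = 2.0, 1.5
best = exponent(theta_star(t, sigma2), t, sigma2)

# The exponent is convex in theta; its derivative -t + (e^theta - 1)*sigma2
# vanishes exactly at theta_star.
grid = [0.001 * i for i in range(1, 5001)]
print(all(exponent(th, t, sigma2) >= best - 1e-12 for th in grid))
print(math.isclose(best, -sigma2 * h(t / sigma2)))
```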